Define the train/test split ratio:
e.g. 80% train, 20% test. Each line is assigned at random, so the resulting proportions are approximate.
In [6]:
split = 0.8  # probability that a line is written to the training file
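The threshold is applied per line: each line is compared against a uniform random draw, so the training share is close to `split` but rarely equal to it. A minimal sketch of that effect follows, with no files involved. The helper name `simulate_train_fraction` and its parameters are hypothetical and only serve this illustration.

```python
from random import Random


def simulate_train_fraction(n_lines, split, seed):
    """Return the fraction of n_lines that a per-line random draw sends to train."""
    rng = Random(seed)  # seeded only so this illustration is repeatable
    # A line goes to train when its uniform draw in [0, 1) falls below split.
    n_train = sum(1 for _ in range(n_lines) if rng.random() < split)
    return n_train / n_lines


fraction = simulate_train_fraction(n_lines=10_000, split=0.8, seed=0)
print(f"train fraction: {fraction:.3f}")
```

The deviation from `split` shrinks as the number of lines grows, so small input files can end up noticeably away from 80/20.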
Specify your input file:
In [7]:
file_origin = "/path/to/your/input/dataset.csv"
Specify your output files:
In [ ]:
file_train = "/path/to/your/output/train_dataset.csv"
file_test = "/path/to/your/output/test_dataset.csv"
Execute the following Python code:
In [3]:
from random import random
In [8]:
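with open(file_train, 'w') as train, \
     open(file_test, 'w') as test, \
     open(file_origin) as origin:
    for line in origin:
        # Draw a uniform number in [0, 1) for each line
        rand = random()
        if rand < split:
            # With probability `split`, the line goes to the training file
            train.write(line)
        else:
            # Otherwise it goes to the test file
            test.write(line)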
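The loop above treats every line the same way, so a CSV header row lands in only one of the two output files, and each run produces a different split. A possible variant is sketched below, assuming the input has one record per line (no quoted fields with embedded newlines) and, by default, a single header line. The function name `split_csv` and the `seed` and `has_header` parameters are hypothetical additions, not part of the original notebook.

```python
from random import Random


def split_csv(origin_path, train_path, test_path, split=0.8, seed=None, has_header=True):
    """Randomly split a file line by line; return (n_train, n_test) data-line counts."""
    rng = Random(seed)  # seed=None gives a different split on every run
    n_train = n_test = 0
    with open(train_path, 'w') as train, \
         open(test_path, 'w') as test, \
         open(origin_path) as origin:
        if has_header:
            # Copy the first line to both outputs so each file keeps its column names
            header = next(origin, '')
            train.write(header)
            test.write(header)
        for line in origin:
            if rng.random() < split:
                train.write(line)
                n_train += 1
            else:
                test.write(line)
                n_test += 1
    return n_train, n_test


# Example call with the variables defined earlier in this notebook:
# split_csv(file_origin, file_train, file_test, split=split, seed=42)
```

Passing a fixed `seed` makes the split repeatable across runs, which helps when results need to be compared later.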